feature/#397 scenario duplication #2373
base: develop
Conversation
force-pushed from 658f02f to 2644479
☂️ Python Coverage
Overall Coverage — New Files: no new covered files — Modified Files: (coverage table omitted)
force-pushed from 2644479 to f79d591
…an_duplicate functions
force-pushed from fe1826e to 5b53c5f
self.__logger.error(f"Error uploading `{up_path.name}` to data " | ||
f"node `{self.id}`:") # type: ignore[attr-defined] | ||
self.__logger.error(f"Error uploading `{up_path.name}` to data " f"node `{self.id}`:") # type: ignore[attr-defined] |
Please check your automatic formatter as this line is over 120 chars. Please revert the change.
I believe there are still a few areas in the code that can be improved.
taipy/core/data/excel.py (Outdated)
def _duplicate_data(self):
    new_data_path = self._duplicate_data_file(self.id)
    if hasattr(self._properties, "_entity_owner"):
        del self._properties._entity_owner
    self._properties[self._PATH_KEY] = new_data_path
    return new_data_path
Is this repetitive? Can we put this in the parent `_FileDataNodeMixin` class and override it when necessary?
Hmmm, I think it's a yes-and-no answer. The `self._properties` attribute doesn't exist in `_FileDataNodeMixin`, but since ExcelDN, etc. implement `_FileDataNodeMixin`, they can access it. But the syntax feels off to me, and since `_FileDataNodeMixin` is a Mixin, I'm not comfortable putting `self._properties` in it. (I understand that we do have similar things for the `def path(self, value)` setter in `_FileDataNodeMixin`, but I'm still not sure it's the best way to do it.)
I understand your point. However, the other methods in the Mixin class don't seem to fit well either.
I think we did discuss this once. @jrobinAV, let us know what you think about this.
Yes, this is debatable. I would personally do the same as the path setter for consistency.
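For what it's worth, a minimal sketch of what hoisting this into `_FileDataNodeMixin` could look like, mirroring the existing `path` setter. It assumes the concrete data node class supplies `_properties`, `id`, and `_duplicate_data_file`; the mixin itself does not define them, which is the same implicit contract the path setter already relies on:

    class _FileDataNodeMixin:
        _PATH_KEY = "path"

        def _duplicate_data(self) -> str:
            # Sketch only: _properties, id, and _duplicate_data_file() are
            # expected from the concrete class (ExcelDataNode, CSVDataNode, ...).
            new_data_path = self._duplicate_data_file(self.id)  # type: ignore[attr-defined]
            if hasattr(self._properties, "_entity_owner"):
                del self._properties._entity_owner
            self._properties[self._PATH_KEY] = new_data_path
            return new_data_path

Subclasses with special needs (e.g. Excel sheet handling) could still override it.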
if base_name.startswith(self.__TAIPY_CLONED_PREFIX):
    base_name = "".join(base_name.split("_")[5:])
new_base_path = os.path.join(folder_path, f"{self.__TAIPY_CLONED_PREFIX}_{id}_{base_name}")
I have two questions:
What would be the new file name if a data node is duplicated twice?
Similarly, what happens if a duplicated data node is duplicated (dn -> duplicated_dn -> duplicated duplicated_dn)?
Yep, so take `example.csv` as the file name: after you first duplicate the dn, it will create a new file `TAIPY_CLONED_new_dn_id_example.csv`. But if you then duplicate this newly duplicated dn, another file will be created with the name pattern `TAIPY_CLONED_another_new_dn_id_example.csv`. The `TAIPY_CLONED` prefix won't be repeated.
I see. So what about duplicating a data node twice?
Then we will have `TAIPY_CLONED_dn_duplicated_id_1_example.csv` for the 1st dn, and `TAIPY_CLONED_dn_duplicated_id_2_example.csv` for the 2nd dn.
I believe having both the `TAIPY_CLONED` prefix and the unique ID in the name is not necessary. If we can make the file name unique with the new ID, what is the purpose of keeping the prefix? We can probably propose something slightly better by differentiating the case where the file name is generated by Taipy from the case where it is provided by the user.
If the file of the initial dn is generated (`dn.is_generated`), we can simply generate a new name for the duplicate with the same function (`dn._build_path`). Otherwise, we can use your proposal without the prefix: `{new_id}_{base_name}`.
What do you think?
Well, I think the prefix makes the naming clear about what the file is. Just the id is fine, but then we would have to state clearly in the doc that this is what we generate, whereas users can understand it from the prefix alone, without the doc.
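For illustration, a rough sketch of the proposal above. The branch on `dn.is_generated` and the use of `dn._build_path` come from this thread; the helper name, the `_build_path` signature, and the `storage_type` call are assumptions:

    import os

    def _build_duplicate_path(dn, new_id: str) -> str:
        # Hypothetical helper, not the PR's implementation.
        if dn.is_generated:
            # Taipy generated the original file name, so generate a fresh
            # one for the duplicate the same way (signature assumed).
            return str(dn._build_path(dn.storage_type()))
        # User-provided file name: make it unique with the new id, no prefix.
        folder, base_name = os.path.split(dn.path)
        return os.path.join(folder, f"{new_id}_{base_name}")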
I believe I know what's bugging me. It's a matter of responsibility.
In short, I believe the data duplication should not be the responsibility of the data node.
In my mind, the data node class only knows itself and does not have a global view of all the data nodes or scenarios. Duplicating data should be the responsibility of classes that do have an overview of the situation: either the managers or new classes dedicated to data duplication.
Give me some time to propose something else.
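To illustrate that split of responsibility, a minimal sketch where a dedicated class, not the data node, decides how the file gets copied and named (the class and method names are hypothetical):

    import shutil
    from pathlib import Path

    class _DataDuplicator:
        """Hypothetical orchestrator: it has the global view, so file-copy
        and naming decisions live here rather than in the data node."""

        def duplicate_data_file(self, dn_path: str, new_dn_id: str) -> str:
            src = Path(dn_path)
            dst = src.with_name(f"{new_dn_id}_{src.name}")
            shutil.copy(src, dst)  # the duplicator copies, the dn stays passive
            return str(dst)

A manager duplicating a scenario could then call this for each file-based data node, leaving the data node class unaware of other entities.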
    reasons._add_reason(self.id, DataNodeEditInProgress(self.id))  # type: ignore[attr-defined]
    return reasons
…
    up_path = pathlib.Path(path)
    try:
        upload_data = self._read_from_path(str(up_path))
    except Exception as err:
-       self.__logger.error(f"Error uploading `{up_path.name}` to data "
-                           f"node `{self.id}`:")  # type: ignore[attr-defined]
+       self.__logger.error(f"Error uploading `{up_path.name}` to data " f"node `{self.id}`:")  # type: ignore[attr-defined]
self.__logger.error(f"Error uploading `{up_path.name}` to data " f"node `{self.id}`:") # type: ignore[attr-defined] | |
self.__logger.error(f"Error uploading `{up_path.name}` to data " | |
f"node `{self.id}`:") # type: ignore[attr-defined] |
What type of PR is this? (check all applicable)
Description
Related Tickets & Documents
#397